Stop Taking Tokenizers for Granted: They Are Core Design Decisions in Large Language Models
arxiv.org·14h
t2x - a CLI tool for AI-first text operations
shruggingface.com·16h
Iterative multi-word anagram solver
boulter.com·21h
Rainbow Query Language
rbql.org·20h
Hugging Face Releases FineTranslations, a Trillion-Token Multilingual Parallel Text Dataset
infoq.com·3d
Event data
docs.gitlab.com·22h
Exploring Text Compression
denvaar.dev·16h